1.
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations ; : 35-42, 2023.
Article in English | Scopus | ID: covidwho-20234954

ABSTRACT

In recent years, COVID-19 has impacted all aspects of human life. As a result, numerous publications relating to this disease have been issued. Due to the massive volume of publications, some retrieval systems have been developed to provide researchers with useful information. In these systems, lexical searching methods are widely used, which raises many issues related to acronyms, synonyms, and rare keywords. In this paper, we present a hybrid relation retrieval system, CovRelex-SE, based on embeddings to provide high-quality search results. Our system can be accessed through the following URL: https://www.jaist.ac.jp/is/labs/nguyen-lab/systems/covrelex-se/. © 2023 Association for Computational Linguistics.
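
The hybrid idea in this abstract can be illustrated with a minimal sketch that blends a lexical-overlap score with a cosine similarity over embeddings. This is not the CovRelex-SE implementation: the function names, the toy vectors, and the `alpha` mixing weight are all assumptions made for illustration.

```python
import math

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear verbatim in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0

def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5) -> float:
    """Weighted blend: alpha for the lexical part, 1 - alpha for the embedding part.

    alpha is a hypothetical tuning knob, not a parameter taken from the paper.
    """
    return alpha * lexical_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)

# Toy example: the vectors stand in for embeddings from some encoder.
score = hybrid_score("covid vaccine", "vaccine trial", [1.0, 0.0], [1.0, 0.0])
```

A query that shares no tokens with a document (an acronym versus its expansion, say) scores zero lexically but can still rank well if the two embeddings are close, which is the gap the abstract attributes to purely lexical search.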

2.
Technology Application in Tourism in Asia: Innovations, Theories and Practices ; : 295-309, 2022.
Article in English | Scopus | ID: covidwho-2326083

ABSTRACT

Social media has been shown to affect tourist activity and spending. However, research related to travel intentions from a large-scale perspective has remained very limited in Indonesia. This research presents an empirical case study using the text mining process on Indonesian domestic tourists' travel intentions to fill this gap. Text classification was used to categorize whether a tweet includes travel intentions or not by concentrating on tourism-related tweet data from Twitter before and after the COVID-19 pandemic. The process of entity recognition was also used to classify the entities in the tweets. This study showed that the Indonesian intention to travel was 13.08 percent higher than before the COVID-19 pandemic. Moreover, it was also found that interest in adventure activities increased by 581.25 percent and honeymoon trips by 175 percent. Surprisingly, 92 percent of short-stay intentions concluded in this research. However, the number of Indonesian tourists who want to take a long tour rose by 215.18 percent. This study's findings also show Indonesian tourists' choice to fly to many destinations, such as Bali, the Riau Islands, and Bandung. A more successful Indonesian tourism promotion strategy is expected to develop as a result of this research. Referring to the study findings, it appears that the current model of promotion is relatively distinct from the existing one. Promotional activities that emphasize and focus on 1) sustainable growth, 2) improved productivity, 3) investment innovation and digital transformation, 4) morals, culture, and social responsibility, and 5) technological cooperation have become increasingly important to incorporate in various programs by the Ministry of Tourism of Indonesia. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022.

3.
7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 ; : 1-10, 2022.
Article in English | Scopus | ID: covidwho-2290872

ABSTRACT

Named Entity Recognition (NER) is a well-known problem for the natural language processing (NLP) community. It is a key component of different NLP applications, including information extraction, question answering, and information retrieval. In the literature, there are several Arabic NER datasets with different named entity tags; however, due to data and concept drift, we are always in need of new data for NER and other NLP applications. In this paper, first, we introduce Wassem, a web-based annotation platform for Arabic NLP applications. Wassem can be used to manually annotate textual data for a variety of NLP tasks: text classification, sequence classification, and word segmentation. Second, we introduce the COVID-19 Arabic Named Entities Recognition (CAraNER) dataset extracted from the Arabic Newspaper COVID-19 Corpus (AraNPCC). CAraNER has 55,389 tokens distributed over 1,278 sentences randomly extracted from Saudi Arabian newspaper articles published during 2019, 2020, and 2021. The dataset is labeled by five annotators with five named-entity tags, namely: Person, Title, Location, Organization, and Miscellaneous. The CAraNER corpus is available for download for free. We evaluate the corpus by finetuning four BERT-based Arabic language models on the CAraNER corpus. The best model was AraBERTv0.2-large with 0.86 for the F1 macro measure. © 2022 Association for Computational Linguistics.

4.
1st International Conference on Machine Learning, Computer Systems and Security, MLCSS 2022 ; : 301-306, 2022.
Article in English | Scopus | ID: covidwho-2294226

ABSTRACT

The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We therefore work on a COVID-19 dataset on the Omicron variant to recognise named entities in a given text. We collect COVID-related data from newspapers and tweets. This article covers named entities such as COVID variant names, organization names, location names, and vaccine names. The process includes tokenisation, POS tagging, chunking, levelling, editing, and running the program. It helps to recognise, from a huge dataset, named entities such as where COVID spread most (location), which variant spread most (variant name), and which vaccine has been given (vaccine name). In this work, we have identified the names. If we treat unemployment, economic downfall, death, recovery, and depression as topics, we can also identify the topic names and the phase in which they occurred. © 2022 IEEE.

5.
Int J Mol Sci ; 23(23)2022 Nov 29.
Article in English | MEDLINE | ID: covidwho-2296973

ABSTRACT

The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, developing cognitive systems that provide a specialized search for information in scientific publications based on subject area ontology and modern artificial intelligence methods is urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to enhance the quality of short name recognition for molecular-genetics entities in PubMed abstracts on eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.


Subject(s)
Artificial Intelligence , Data Mining , Data Mining/methods , PubMed , Databases, Factual , Proteins
6.
13th IEEE International Conference on Knowledge Graph, ICKG 2022 ; : 56-63, 2022.
Article in English | Scopus | ID: covidwho-2258490

ABSTRACT

While manual analysis of news coverage is difficult and time consuming, methods in natural language processing can be used to uncover otherwise hidden semantics. This work analyses more than 370,000 news articles to explore connections and trends in business decisions and their financial impact during the COVID-19 pandemic. Topic modelling, sentiment analysis and named entity recognition methods are used to identify connections between the articles and the financial performance of selected companies or industries. This report sets out the results of the individual natural language processing methods and the resulting analysis with financial data. Interesting contrasting topics in the media can be filtered out that are associated with the companies with the highest or lowest positive sentiment. This information could be useful to companies to gain an understanding of topics that are currently treated favourably or unfavourably by the media and hence assist with communication strategies and competitive intelligence. © 2022 IEEE.

7.
2022 IEEE International Conference on Big Data, Big Data 2022 ; : 5173-5181, 2022.
Article in English | Scopus | ID: covidwho-2248652

ABSTRACT

Clinical Cohort Studies (CCS), such as randomized clinical trials, are a great source of documented clinical research. Ideally, a clinical expert inspects these articles for exploratory analysis ranging from drug discovery for evaluating the efficacy of existing drugs in tackling emerging diseases to the first test of newly developed drugs. However, more than 100 articles are published daily on a single prevalent disease like COVID-19 in PubMed. As a result, it can take days for a physician to find articles and extract relevant information. Can we develop a system to sift through these articles faster and document the crucial takeaways from each of these articles? In this work, we propose CCS Explorer, an end-to-end system for relevance prediction of sentences, extractive summarization, and patient, outcome, and intervention entity detection from CCS. CCS Explorer is packaged in a web-based graphical user interface where the user can provide any disease name. CCS Explorer then extracts and aggregates all relevant information from articles on PubMed based on the results of an automatically generated query produced on the back-end. For each task, CCS Explorer fine-tunes pre-trained language representation models based on transformers with additional layers. The models are evaluated using two publicly available datasets. CCS Explorer obtains a recall of 80.2%, AUC-ROC of 0.843, and an accuracy of 88.3% on sentence relevance prediction using BioBERT and achieves an average Micro F1-Score of 77.8% on Patient, Intervention, Outcome detection (PIO) using PubMedBERT. Thus, CCS Explorer can reliably extract relevant information to summarize articles, saving time by ~660×. © 2022 IEEE.

8.
Expert Systems with Applications ; 223, 2023.
Article in English | Scopus | ID: covidwho-2263399

ABSTRACT

Because of the frequent occurrence of chronic diseases, the COVID-19 pandemic, etc., online health expert question-answering (HQA) services have been unable to cope with the rapidly increasing demand for online consultations. Building a virtual health assistant based on medical named entity recognition (NER) can effectively assist with the consultation process, but the unstandardized expressions within HQA text pose a serious challenge for medical NER tasks. The main goal of this study is to propose a novel deep medical NER approach based on a collaborative decision strategy (CDS), i.e., co_decision_NER (CDN), that can identify standard and nonstandard medical entities in the HQA context. We collected 10,000 question–answer pairs from HaoDF, extracted medical entities from 15 entity categories, and used a CDS to fuse the advantages of different NER models. Ultimately, CDN achieved a performance (precision = 84.50%, recall = 84.30%, F1 = 84.40%) that was significantly better than that of the state-of-the-art (SOTA) method. Our empirical analysis suggests that the entity types Disease (DIS), Sign (SIG), Test (TES), Drug (DRU), Surgery (SUR), Precaution (PRE), and Region (REG) can be most easily expressed arbitrarily in the doctor–patient interaction scenario of HQA services. In addition, CDN can identify not only standard but also nonstandard medical entities, effectively alleviating the severe out-of-vocabulary (OOV) problem faced by HQA services when performing medical NER tasks. The core contribution of this study is the development of a novel neural network model fusion algorithm that can improve the performance of entity recognition in medical domain-specific tasks. © 2023 Elsevier Ltd
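
As a quick consistency check on the figures reported in this abstract, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The helper below is only a sketch of that standard formula; the function name is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract, in percent.
reported_f1 = f1_score(84.50, 84.30)
print(round(reported_f1, 2))
```

With the reported precision and recall, the result rounds to 84.40, which agrees with the F1 stated in the abstract.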

9.
IET Cyber-Physical Systems: Theory and Applications ; 2023.
Article in English | Scopus | ID: covidwho-2244409

ABSTRACT

With the rapid development of biomedical research and information technology, the volume of clinical medical literature has increased exponentially. At present, COVID-19 clinical text research has some problems, such as a lack of corpora and poor annotation quality. In clinical medical literature, there are many medical-related semantic relationships between entities. After the task of entity recognition, how to further extract the relationships between entities efficiently and accurately becomes very critical. In this study, a COVID-19 clinical trial data relationship extraction model based on a deep learning method is proposed. The model adopts an integrated architecture of the MPNet model, a bidirectional GRU (BiGRU) network, the MAtt mechanism and a Conditional Random Field inference layer, and uses a pre-trained language model to address the problem that static word vectors cannot represent ambiguity. The BiGRU network is used to replace the current bidirectional long short-term memory (LSTM) structure and simplify the network structure of LSTM to improve the training efficiency of the model. Through comparative experiments, the proposed method performs well in the COVID-19 clinical text entity relation extraction task. © 2023 The Authors. IET Cyber-Physical Systems: Theory & Applications published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

10.
J Ambient Intell Humaniz Comput ; : 1-15, 2021 Jun 10.
Article in English | MEDLINE | ID: covidwho-2243986

ABSTRACT

Real-time data processing and distributed messaging are problems that have been worked on for a long time. As the amount of spatial data being produced has increased, coupled with increasingly complex software solutions being developed, there is a need for platforms that address these needs. In this paper, we present a distributed and light streaming system for combating pandemics and give a case study on spatial analysis of the COVID-19 geo-tagged Twitter dataset. In this system, three of the major components are the translation of tweets matching user-defined bounding boxes, named entity recognition in tweets, and skyline queries. Apache Pulsar addresses all these components in this paper. With the proposed system, end-users have the capability of getting COVID-19 related information within foreign regions, filtering/searching location, organization, person, and miscellaneous based tweets, and performing skyline-based queries. The evaluation of the proposed system is done based on certain characteristics and performance metrics. The study differs greatly from other studies in terms of using distributed computing and big data technologies on spatial data to combat COVID-19. It is concluded that Pulsar is designed to handle large amounts of long-term on-disk persistence.
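
For readers unfamiliar with the skyline queries mentioned in this abstract: a skyline query returns the records that no other record dominates, where one record dominates another if it is at least as good in every dimension and strictly better in at least one. The sketch below is a plain in-memory version, assuming smaller values are better; it does not reflect how the paper runs such queries on Apache Pulsar, and the example dimensions are hypothetical.

```python
def dominates(a, b) -> bool:
    """True if a is no worse than b in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Return the points not dominated by any other point (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (distance_km, tweet_age_hours) pairs; lower is better for both.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
result = skyline(candidates)
```

Here `(3, 3)` drops out because `(2, 2)` beats it on both dimensions, while the remaining three points each win on at least one dimension against every rival.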

11.
2022 IEEE MIT Undergraduate Research Technology Conference, URTC 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2230986

ABSTRACT

This paper presents a named entity recognition system for the specific domain of Vietnamese COVID-19 news articles. By incorporating manually selected and domain-specific features into a simple deep learning architecture, the system can identify a wide range of custom named entities relevant in the context of COVID-19 and future epidemics. Using high-dimensional embedding vectors in combination with part-of-speech tags and additional features, the system achieves an F score of about 90.41%, surpassing or coming close to results by other models that are more complicated or pre-trained and fine-tuned. © 2022 IEEE.

12.
2022 IEEE MIT Undergraduate Research Technology Conference, URTC 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2223158

ABSTRACT

This paper presents a named entity recognition system for the specific domain of Vietnamese COVID-19 news articles. By incorporating manually selected and domain-specific features into a simple deep learning architecture, the system can identify a wide range of custom named entities relevant in the context of COVID-19 and future epidemics. Using high-dimensional embedding vectors in combination with part-of-speech tags and additional features, the system achieves an F score of about 90.41%, surpassing or coming close to results by other models that are more complicated or pre-trained and fine-tuned. © 2022 IEEE.

13.
2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022 ; : 2274-2280, 2022.
Article in English | Scopus | ID: covidwho-2223066

ABSTRACT

Toward efficient learning of massive publications during the COVID-19 pandemic, we propose a pipeline, Knowledge Extraction for COVID-19 Publications (KEP), that aims at automatic extraction and representation of key knowledge from user-interested publications. The first version, KEP-1.0, has been developed and published on the Python Package Index (PyPI) (URL: https://pypi.org/project/KEP/). In this first release, knowledge about key topics, disease discussions, and location mentions for each publication is provided. KEP-1.0 not only extracts relevant knowledge but, more importantly, emphasizes the top discussed entities and presents visualizable plots, including bar graphs and word clouds. This allows a rapid preliminary understanding of the main discussions in the publication from these three aspects. Moreover, an enhanced TF-IDF algorithm, the weighted TF-IDF, targeting the publication topic identification purpose, has been proposed and evaluated. The pipeline is fully open-sourced and customizable. KEP-1.0 is ready for use in its current form or to be embedded into existing literature platforms. This pipeline is designed for COVID-related publications, but it has the potential to benefit similar knowledge extraction tasks for other topics of interest with a rapidly increasing number of publications. © 2022 IEEE.
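
The abstract does not specify how the weighted TF-IDF variant assigns its weights, so the sketch below shows only the standard TF-IDF baseline that such a variant would extend. It is not KEP's code: the function name, the whitespace tokenizer, and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document using plain TF-IDF.

    TF is the term count divided by document length; IDF is log(N / df),
    where df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

toy_docs = ["covid vaccine trial", "covid lockdown policy"]
toy_scores = tf_idf(toy_docs)
```

A term that occurs in every document, such as "covid" in this toy corpus, gets a score of zero, which is why TF-IDF favours terms that distinguish one publication's topic from the rest.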

14.
Procesamiento Del Lenguaje Natural ; - (69):165-176, 2022.
Article in English | Web of Science | ID: covidwho-2218007

ABSTRACT

Several initiatives have emerged during the COVID-19 pandemic to gather scientific publications related to coronaviruses. Among them, the COVID-19 Open Research Dataset (CORD-19) has proven to be a valuable resource that provides full-text articles from the PubMed Central, bioRxiv and medRxiv repositories. Such a large amount of biomedical literature needs to be properly managed to facilitate and promote its use by health professionals, for example by tagging documents with the biomedical entities that appear in them. We created a biomedical named entity recognizer (NER) that normalizes (NEN) the drugs, diseases, genes and proteins mentioned in texts with the codes of the main standardization systems such as MeSH, ICD-10, ATC, SNOMED, ChEBI, GARD and NCBI. It is based on fine-tuning the BioBERT language model independently for each entity type using domain-specific datasets and an inverse index search to normalize the references. We have used the resultant BioNER+BioNEN system to process the CORD-19 corpus and offer an overview of the drugs, diseases, genes and proteins related to coronaviruses in the last fifty years.
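
The normalization step described here, looking a recognized mention up in an inverse (inverted) index of vocabulary terms, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' system: the placeholder codes `X001` and `X002` are invented and belong to no real terminology, and the token-voting rule is one simple choice among many.

```python
from collections import defaultdict

def build_inverted_index(vocabulary):
    """Map each lowercase token to the set of codes whose names contain it."""
    index = defaultdict(set)
    for code, names in vocabulary.items():
        for name in names:
            for token in name.lower().split():
                index[token].add(code)
    return index

def normalize(mention, index):
    """Return the code sharing the most tokens with the mention, or None."""
    votes = defaultdict(int)
    for token in mention.lower().split():
        for code in index.get(token, ()):
            votes[code] += 1
    if not votes:
        return None
    # Sorting first makes ties resolve to the alphabetically smallest code.
    return max(sorted(votes), key=votes.get)

# Placeholder vocabulary: codes are hypothetical, not real MeSH or ICD-10 entries.
toy_vocabulary = {
    "X001": ["severe acute respiratory syndrome"],
    "X002": ["acute kidney injury"],
}
toy_index = build_inverted_index(toy_vocabulary)
best = normalize("acute respiratory syndrome", toy_index)
```

The index is built once per vocabulary, so each lookup touches only the codes that share a token with the mention instead of scanning every entry.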

15.
13th International Conference on Computing Communication and Networking Technologies, ICCCNT 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2213233

ABSTRACT

Health literacy is the ability of a person to read and understand medical text and to use that information to make informed healthcare decisions. Unfortunately, medical articles are difficult to comprehend by common people as they use complex language and domain-specific terms. Improving health literacy is important for empowering communities against emerging threats and the COVID-19 pandemic bears testimony to this statement. One way to improve health literacy is easing access to complex healthcare information by summarising medical texts and simplifying them lexically by translating specific medical terminology to laymen's terms. In this paper we propose a system that performs extractive summarization on the medical article given as input followed by named entity recognition for identifying medical terms. The meanings of identified medical entities are then found through web scraping and displayed to the user along with the summary. We have experimented with state-of-the-art summarization models and Albert (A lite BERT) has provided the best ROUGE-1 score of 0.3789 and ROUGE-L of 0.2084. © 2022 IEEE.
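
To make the ROUGE-1 figure easier to interpret: ROUGE-1 measures unigram overlap between a generated summary and a reference summary. The abstract does not say whether its score is a precision, recall, or F-measure, so the simplified sketch below returns all three. It uses whitespace tokenization with no stemming, which the official ROUGE tooling handles differently, and the example sentences are invented.

```python
from collections import Counter

def rouge_1(candidate: str, reference: str):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f = rouge_1(
    "the vaccine reduces severe illness",
    "the vaccine prevents severe illness",
)
```

Four of the five tokens match in each direction here, so precision, recall, and F1 all come out at 0.8.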

16.
7th China National Conference on Big Data and Social Computing, BDSC 2022 ; 1640 CCIS:374-388, 2022.
Article in English | Scopus | ID: covidwho-2173952

ABSTRACT

With the global outbreak of virulent infectious diseases such as COVID-19, the lack of medical resources has become a serious social problem. More and more online users who encounter physical discomfort will first choose to look up the relevant symptoms on the Internet. Some online platforms have been able to understand the symptoms entered by users and provide auxiliary diagnosis suggestions. However, the professionalism and accuracy of online health and medical Question Answering (QA) systems are very insufficient. How to obtain and utilize massive medical symptom data from multiple data sources such as the Internet, medical symptom libraries and medical professional electronic books has become a difficult problem. The development of big data, artificial intelligence, and especially knowledge graph technology has provided ideas and methods to solve this problem. In this paper, a medical knowledge graph NWNU-KG is constructed on the basis of multi-source data, and the BiLSTM-CRF-CNN-Dict (BCCD) joint model is used for entity recognition and relationship extraction of user-asked questions to implement a healthcare knowledge QA system. Numerous experiments have found that the joint model BCCD proposed in this paper has a higher accuracy rate compared with the best available models, and can filter the answers to questions and return them to users in multi-source, heterogeneous and massive healthcare data, which has some practical value. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

17.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; 13610 LNCS:23-32, 2022.
Article in English | Scopus | ID: covidwho-2173854

ABSTRACT

Biomedical named entity recognition is becoming increasingly important to biomedical research due to a proliferation of articles and also due to the current pandemic disease. This paper addresses the task of automatically finding and recognizing biomedical entity types related to COVID (e.g., virus, cell, therapeutic) with tolerance rough sets. The task includes i) extracting nouns and their co-occurring contextual patterns from a large BioNER dataset related to COVID-19 and, ii) annotating unlabelled data with a semi-supervised learning algorithm using co-occurrence statistics. 465,250 noun phrases and 6,222,196 contextual patterns were extracted from 29,500 articles using natural language text processing methods. Three categories were successfully classified at this time: virus, cell and therapeutic. Early precision@N results demonstrate that our proposed tolerant pattern learner (TPL) is able to constrain concept drift in all 3 categories during the iterative learning process. © 2022, Springer-Verlag GmbH Germany, part of Springer Nature.
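
The precision@N metric cited in this abstract is the fraction of the top N ranked candidates that are correct. A minimal sketch follows; the function name and the toy ranking are assumptions, and the example terms are illustrative, not output from the paper's learner.

```python
def precision_at_n(ranked, relevant, n: int) -> float:
    """Fraction of the first n ranked items that belong to the relevant set."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked[:n]
    return sum(1 for item in top_n if item in relevant) / n

# Illustrative ranking of candidate phrases for a hypothetical "virus" category.
ranked_phrases = ["sars-cov-2", "hospital", "mers-cov", "patient"]
true_viruses = {"sars-cov-2", "mers-cov"}
p_at_2 = precision_at_n(ranked_phrases, true_viruses, 2)
```

Tracking this value across iterations of a semi-supervised learner gives a simple signal of concept drift: if wrong phrases start to fill the top ranks, precision@N falls.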

18.
19th International Conference on Web Information Systems and Applications, WISA 2022 ; 13579 LNCS:267-279, 2022.
Article in English | Scopus | ID: covidwho-2173751

ABSTRACT

Since the outbreak of the COVID-19 epidemic at the end of 2019, the normalization of epidemic prevention and control has become one of the core tasks of the entire country. Health self-examination by checking the trajectory of diagnosed patients has gradually become everyone's basic necessity and essential to epidemic prevention. The COVID-19 patient's spatio-temporal information helps the public check whether their own trajectory overlaps with that of confirmed cases, which promotes the epidemic prevention work. This paper proposes a named entity recognition model to automatically identify the time and place information in the COVID-19 patient trajectory text. The model consists of an ALBERT layer, a Bi-GRU layer, and a GlobalPointer layer. The first two layers jointly focus on extracting the context's characteristics and the semantic dependencies. The GlobalPointer layer extracts the corresponding named entities from a global perspective, which improves the recognition ability for the long-nested place and time entities. Compared to the conventional named entity recognition models, our proposed model has high effectiveness because it has a smaller parameter scale and faster training speed. We evaluate the proposed model using a dataset crawled from the official COVID-19 trajectory text. The F1-score of the model has reached 92.86%, which outperforms four traditional named entity recognition models. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

19.
4th International Conference on Information Systems and Management Science, ISMS 2021 ; 521 LNNS:419-427, 2023.
Article in English | Scopus | ID: covidwho-2173622

ABSTRACT

Entity extraction from text data in the biomedical domain has an essential role in biomedical research. In natural language processing, the entity extraction task aims to classify terms into predefined categories. With the emergence of COVID-19, COVID-related digital resources increased drastically and new types of entities were introduced. State-of-the-art named entity extraction models rely heavily on domain-specific resources and struggle to perform adequately on COVID-related data. In this paper, we propose a deep-learning-based architecture for named entity recognition. The experiment was performed on the CORD-NER dataset, which was released by the University of Illinois. We compare the performance of different deep-learning-based architectures on this data for a named entity recognition task. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

20.
Comput Biol Chem ; 102: 107808, 2023 Feb.
Article in English | MEDLINE | ID: covidwho-2165189

ABSTRACT

The number of biomedical articles published is increasing rapidly over the years. Currently there are about 30 million articles in PubMed and over 25 million mentions in Medline. Among these fundamentals, Biomedical Named Entity Recognition (BioNER) and Biomedical Relation Extraction (BioRE) are the most essential in analysing the literature. In the biomedical domain, a Knowledge Graph is used to visualize the relationships between various entities such as proteins, chemicals and diseases. Scientific publications have increased dramatically as a result of the search for treatments and potential cures for the new Coronavirus, but efficiently analysing, integrating, and utilising related sources of information remains a difficulty. In order to effectively combat the disease during pandemics like COVID-19, literature must be used quickly and effectively. In this paper, we introduce a fully automated framework consisting of BERT-BiLSTM, a Knowledge Graph, and a Representation Learning model to extract the top diseases, chemicals, and proteins related to COVID-19 from the literature. The proposed framework uses Named Entity Recognition models for disease recognition, chemical recognition, and protein recognition. Then the system uses the Chemical - Disease Relation Extraction and Chemical - Protein Relation Extraction models, and extracts the entities and relations from the CORD-19 dataset using these models. The system then creates a Knowledge Graph for the extracted relations and entities. The system performs Representation Learning on this KG to get the embeddings of all entities and get the top related diseases, chemicals, and proteins with respect to COVID-19.


Subject(s)
COVID-19 , Pattern Recognition, Automated , Humans , Data Mining/methods